geotiff: pin validated IP into HTTP source to close DNS-rebind TOCTOU#1853
Merged
Conversation
…#1846) `_validate_http_url` resolves the hostname and rejects private IPs at construction (and on each redirect). urllib3 then resolves the hostname a *second* time at connect-time. A hostile DNS server can return a public IP to the validator and a private IP at connect, slipping past the SSRF guard (the existing comment honestly labels the check "best-effort" for this reason). Fix: `_validate_http_url` now returns the first validated public IP. `_HTTPSource` builds a per-hop urllib3 `HTTP[S]ConnectionPool` whose custom `_PinnedHTTP[S]Connection._new_conn` dials that exact IP via `socket.create_connection` without re-consulting DNS. `self.host` stays set to the original hostname (so the HTTP Host header and TLS SNI / cert verification still use the name), and `server_hostname` is passed to `HTTPSConnectionPool` so cert hostname checks run against the name, not the IP. Pools are cached per `(scheme, host, port, ip)` tuple on the source so range requests against the same hop reuse TCP/TLS. Each redirect target is freshly re-validated and re-pinned (a hop to a new host gets its own pool). The escape hatch `XRSPATIAL_GEOTIFF_ALLOW_PRIVATE_HOSTS=1` returns `None` from the validator and falls back to the shared `PoolManager`, so localhost dev/test flows stay working. Residual scope: within a single hop the IP is pinned; across redirects each hop is independently resolved and validated. An attacker who legitimately owns multiple public IPs on a hostname can influence which one we pick (we take the first); they cannot redirect us to a private IP. Tests: - Rebinding scenario: validator sees 93.184.216.34; subsequent `getaddrinfo` calls return 127.0.0.1. The intercepted `socket.create_connection` confirms the TCP target is the validated public IP. - Host header and SNI stay set to the original hostname. - Redirect to a different safe host re-resolves and re-validates the new hostname. - Existing redirect-to-private guard still fires. Closes #1846.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR hardens the GeoTIFF HTTP range reader against DNS-rebinding / TOCTOU by pinning the validated (public) IP address into the actual TCP connection, so urllib3 can’t re-resolve the hostname to a different (potentially private) IP at connect-time.
Changes:
- Update
_validate_http_url()to return the first validated public IP (orNonewhenXRSPATIAL_GEOTIFF_ALLOW_PRIVATE_HOSTS=1is set) so callers can pin connections. - Add pinned-IP urllib3
HTTP[S]Connection+ per-hopHTTP[S]ConnectionPoolconstruction/caching in_HTTPSource. - Add a new test module covering validator behavior, TCP target pinning, host/SNI preservation, and redirect re-validation.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
xrspatial/geotiff/_reader.py |
Implements IP pinning by introducing custom urllib3 connection/pool logic and wiring it into _HTTPSource’s redirect-following request loop. |
xrspatial/geotiff/tests/test_dns_rebinding_pin_issue_1846.py |
Adds targeted regression tests for DNS-rebinding prevention and hop-by-hop re-validation behavior. |
Comments suppressed due to low confidence (1)
xrspatial/geotiff/_reader.py:921
- When
pinned_ipis set,_pool_for_requestcan return aurllib3.HTTPConnectionPool/HTTPSConnectionPool, but_requestalways callspool.request('GET', current_url, ...)with an absolute URL.ConnectionPool.request()expects a request target/path (e.g./cog.tif?x=1), not a fullhttps://host/...URL; passing an absolute URL typically produces an invalid origin-form request line (or a path like/https://host/...) and can break real HTTP range reads on the pinned path. Consider deriving the request target fromurlparse(current_url)(path + optional?query) when using a per-host pool, while keeping the absolute URL for thePoolManagerpath.
pool = self._pool_for_request(current_url, current_pin)
resp = pool.request(
'GET', current_url,
headers=headers,
timeout=timeout,
redirect=False,
)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #1846.
_validate_http_urlresolves the hostname viasocket.getaddrinfoand rejects any URL that resolves to a private / loopback / link-local IP. urllib3 then resolves the hostname a second time at connect-time. A hostile DNS server can return a public IP to the validator and a private IP at connect (classic DNS rebinding / TOCTOU), defeating the SSRF guard. The existing code comment honestly labels the check "best-effort" for this reason.Fix
_validate_http_urlnow returns the first validated public IP (orNonewhen the escape hatch is set)._HTTPSourcebuilds a per-hop urllib3HTTP[S]ConnectionPoolwhose custom_PinnedHTTP[S]Connection._new_conndials that exact IP viasocket.create_connection, never re-consulting DNS.self.hoststays the original hostname so the HTTPHostheader (virtual hosting) and TLS SNI / cert verification still use the name, not the IP.server_hostnameis passed toHTTPSConnectionPoolso HTTPS certs are still validated against the original hostname.(scheme, host, port, ip)tuple on the source. Range requests against the same hop reuse TCP/TLS; redirect hops to a different host get their own pool.Residual scope
XRSPATIAL_GEOTIFF_ALLOW_PRIVATE_HOSTS=1returnsNonefrom the validator and falls back to the sharedPoolManager. Localhost dev/test flows still work, with the trade-off being no pin (same as before).Test plan
test_dns_rebinding_pin_issue_1846.py(9 tests):Nonewhen the escape hatch is on._HTTPSource.__init__records the pinned IP.getaddrinforeturns93.184.216.34to the validator, then127.0.0.1afterwards. The interceptedsocket.create_connectionconfirms the TCP target is93.184.216.34.test_ssrf_hardening_1664.py(46 tests) still pass.mainand unrelated -- matplotlib/Python 3.14 deepcopy recursion intest_features.py, GPU predictor tests, and a tile-size validation test).Refs #1664.